A very brief...
Intro to R





Mladen Čučak

mladencucak@gmail.com

Topics

  • About R/RStudio
  • Basics of programming with R
  • Data analysis with tidyverse






These materials are based on the APS's “R for Plant Pathologists”, a more comprehensive workshop available here

Why R


  • Performance: stable, light and fast

  • Support network: documentation, community, developers

  • Reproducibility: anyone anywhere can reproduce results

  • Versatility: a unified solution to almost any numerical problem, with strong graphical capabilities

  • Ethics: accessible to anyone as it is free and open source

Be strong!

The transition from “point and click” software is tough but rewarding

Baby steps

Help:

  • Google: just add “with R” at the end of any search
  • Stack Overflow: programming questions
  • Cross Validated: scientific questions

Learning:

Stay focused! Don’t get overwhelmed!

Your new best friends



R – programming language for statistical computing, data manipulation, and graphics


http://www.r-project.org/



RStudio – an Integrated Development Environment (IDE) that makes our life much easier

https://rstudio.com/

How is that?



R – Engine



RStudio – Dashboard


R interface

is not the friendliest one

RStudio (IDE)

Moving on to some coding

  • Move the cursor onto a line with R code and press (Win) Ctrl + Enter or (Mac) Cmd + Return.

    Challenge: Do it with the hand you are not using to hold the mouse!

Tip for later:

  • To run an entire script: (Win) Ctrl + Shift + Enter or (Mac) Cmd + Shift + Return
  • To see the many other keyboard shortcuts in RStudio: (Win) Alt + Shift + K or (Mac) Option + Shift + K

R basics: In R, we have...

Objects, where the data is stored. Data is assigned using <-

x <- 1
y <- 2
x + y
[1] 3

The same result as typing the numbers directly:

1 + 2
[1] 3
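
A minimal sketch of how assignment behaves, in case it helps. The object names `a`, `b` and `total` are made up for illustration. An object keeps its value until something is assigned to it again, and typing its name prints the value.

```r
# Hypothetical object names, used only for illustration
a <- 5           # create an object called a
b <- a * 2       # use a to create another object
total <- a + b   # store a result instead of just printing it
total            # typing the name prints the value
# [1] 15

a <- 100         # reassigning a does not change b or total
total
# [1] 15
class(total)     # what kind of data the object holds
# [1] "numeric"
```

Storing results in named objects is what makes later steps reusable: the value is computed once and can be passed to any function afterwards.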

Functions are applied on these objects to analyze the data.

# I am a comment!!! Just here to help jog the memory later on...
# Let us make a function!
addition <- function(argument_one,
                     argument_two){ 
  argument_one + argument_two # operations
} # curly brackets define operations

ls() # check content of the environment
[1] "addition" "x"        "y"       
addition(argument_one = x,
         argument_two = y)
[1] 3

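# Arguments can also be matched by position, without naming them
addition(x, y)
[1] 3
addition(x, y) == x + y # note the double "=": it tests equality, it does not assign
[1] TRUE
all.equal(addition(x, y), x + y) # built-in check that tolerates tiny rounding differences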
[1] TRUE
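
With addition the order of the arguments does not change the result, so a small sketch with a made-up `subtraction` function (not part of the original materials) may show more clearly why naming arguments matters. It also shows where `==` and `all.equal()` give different answers.

```r
# Hypothetical function, used only to show that argument order matters
subtraction <- function(argument_one, argument_two) {
  argument_one - argument_two
}

subtraction(10, 4)                                 # matched by position
# [1] 6
subtraction(argument_two = 4, argument_one = 10)   # matched by name, any order works
# [1] 6
subtraction(4, 10)                                 # positions swapped, different result
# [1] -6

# == compares exactly, all.equal() allows tiny rounding differences
0.1 + 0.2 == 0.3
# [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
# [1] TRUE
```

Naming arguments costs a bit more typing, but it protects against silently swapping two inputs.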

Objects: Vectors

Vectors store data of the same type
(like a column of an Excel table)

Types of data:

num <- c(50, 60, 65) 

char <- c("mouse", "rat", "dog") 

fct <- factor(c("low", "med", "high"), levels = c("low", "med", "high")) # levels set the order

dates <- as.Date(c("02/27/92", "02/27/92", "01/14/92"), "%m/%d/%y")

logical <-  c(FALSE, FALSE, TRUE) # only TRUE or FALSE
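
A short sketch of how to check what type a vector holds, and what happens when types are mixed. The vectors are re-created here so the example runs on its own, and `mixed` is a made-up name for illustration.

```r
# Same kinds of vectors as in the slide, re-created so the sketch is self-contained
num  <- c(50, 60, 65)
char <- c("mouse", "rat", "dog")
fct  <- factor(c("low", "med", "high"), levels = c("low", "med", "high"))

class(num)
# [1] "numeric"
class(char)
# [1] "character"
class(fct)
# [1] "factor"
levels(fct)     # the categories a factor can take, in order
# [1] "low"  "med"  "high"
length(num)     # number of elements
# [1] 3

# Mixing types in one vector: R converts everything to a single type
mixed <- c(1, "rat", TRUE)
class(mixed)
# [1] "character"
mixed
# [1] "1"    "rat"  "TRUE"
```

This silent conversion is a common source of surprises after importing data, so `class()` is worth running early.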

Subsetting and Indexing

num[1] # 1st element
[1] 50
num[num >= 60] # elements greater than or equal to 60
[1] 60 65
char == "dog" # gives the same values as the `logical` vector defined earlier
[1] FALSE FALSE  TRUE
char[logical]
[1] "dog"
char[char == "dog"]
[1] "dog"
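
A few more indexing patterns, sketched with the same example vectors (re-created here so the block runs on its own). These are common base R idioms rather than anything specific to these materials.

```r
num  <- c(50, 60, 65)
char <- c("mouse", "rat", "dog")

num[c(1, 3)]                      # 1st and 3rd elements
# [1] 50 65
num[-1]                           # everything except the 1st element
# [1] 60 65
which(num >= 60)                  # positions (not values) that meet the condition
# [1] 2 3
char[num >= 60]                   # a condition on one vector can subset another
# [1] "rat" "dog"
char[char %in% c("rat", "dog")]   # match against several values at once
# [1] "rat" "dog"
num[num > 55 & num < 65]          # combine conditions with & (and) or | (or)
# [1] 60
```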

Objects: Dataframes

A data frame is a set of vectors of the same length (like an entire Excel table)

Creating data frames

df <- data.frame(
  col_one = num,
  col_two = char
)
print(df)
  col_one col_two
1      50   mouse
2      60     rat
3      65     dog
head(df,1)
  col_one col_two
1      50   mouse

Same logic for indexing, just in 2 dimensions

df[1, 1] # [rows, columns]
[1] 50
df[, 1] # 1st column in the data frame
[1] 50 60 65
df[, -2] # Exclude 2nd column
[1] 50 60 65
df[2:3, "col_two"] 
[1] "rat" "dog"
df$col_two
[1] "mouse" "rat"   "dog"  
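
The same logical conditions used on vectors also work on the rows of a data frame. A minimal sketch, with the data frame re-created so it runs on its own. The column name `col_three` is made up for illustration.

```r
df <- data.frame(
  col_one = c(50, 60, 65),
  col_two = c("mouse", "rat", "dog")
)

nrow(df)                             # number of rows
# [1] 3
df[df$col_one >= 60, ]               # rows meeting the condition, all columns
#   col_one col_two
# 2      60     rat
# 3      65     dog
df[df$col_two == "dog", "col_one"]   # one value: row by condition, column by name
# [1] 65

df$col_three <- df$col_one / 10      # add a new column (hypothetical name)
ncol(df)
# [1] 3
str(df)                              # compact overview of columns and their types
```

Note the comma in `df[df$col_one >= 60, ]`: leaving the column position empty means "keep all columns".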

R packages

Pre-made sets of functions for common (and not so common) tasks
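
A sketch of the usual package workflow: install once, then load in each session. The install and load lines are left commented out because they need an internet connection. The `stats` package is used for the runnable part only because it ships with R, and `has_stats` is a made-up name.

```r
# Install once per computer (needs an internet connection), so left commented out
# install.packages("tidyverse")

# Load in every new R session before using the package's functions
# library(tidyverse)

# Check whether a package is available without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)   # stats ships with R
has_stats
# [1] TRUE

# Call a single function from a package with the package::function form
stats::median(c(50, 60, 65))
# [1] 60
```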

Data analysis with tidyverse

Another level: a package of packages
Think of something like the Microsoft Office suite
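
To give a flavour of the style, here is a rough sketch using dplyr, one of the tidyverse packages. It assumes dplyr is installed, and skips the example otherwise. The column name `col_three` and the object name `result` are made up for illustration.

```r
df <- data.frame(
  col_one = c(50, 60, 65),
  col_two = c("mouse", "rat", "dog")
)

# Run only if dplyr is installed (an assumption, it is not part of base R)
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  result <- df %>%                          # %>% passes the result to the next step
    filter(col_one >= 60) %>%               # keep rows that meet the condition
    mutate(col_three = col_one / 10) %>%    # add a new column (hypothetical name)
    arrange(desc(col_one))                  # sort from largest to smallest

  print(result)
}
```

Each step reads like a verb applied to the data, which is the main appeal compared with the bracket indexing shown earlier.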